To address problems in face editing such as unnatural editing results and large changes in the generated images, a controllable face editing algorithm with a closed-form solution was proposed. Firstly, n latent vectors were sampled randomly to construct a sample matrix, and the top k principal component vectors of the matrix were calculated. Then, five attributes of the face images were obtained by ResNet-50, and the semantic boundary of each attribute was calculated by Support Vector Machine (SVM). Finally, the interpretable direction vector of each attribute was calculated so that it was as close to the principal component vectors as possible while staying as far away from the semantic boundary of the corresponding attribute as possible, thereby reducing the coupling between facial attributes and improving the controllability of face editing. Because the algorithm has a closed-form solution, it is highly efficient. Experimental results show that, compared with the Closed-Form Factorization of Latent Semantics in GANs (SeFa) algorithm and the Discovering Interpretable Generative Adversarial Network Controls (GANSpace) algorithm, the proposed algorithm increases the Inception Score (IS) by 19% and 26% respectively, decreases the Fréchet Inception Distance (FID) by 4% and 37% respectively, and decreases the Maximum Mean Discrepancy (MMD) by 15% and 48% respectively. These results indicate that the algorithm has good controllability and decoupling.
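As a rough illustration of the first step in the face editing abstract above (sampling latent vectors and extracting the top k principal components), the following sketch uses NumPy. The Gaussian sampling, the sizes n, d and k, and the function name `top_k_principal_directions` are assumptions made for illustration; the abstract does not specify them. The SVM boundaries and the closed-form solve for the attribute directions are not reproduced here.

```python
import numpy as np

def top_k_principal_directions(latents: np.ndarray, k: int) -> np.ndarray:
    """Return the top-k principal directions (as rows) of an (n, d) sample matrix."""
    # Center the samples so the SVD captures variance rather than the mean offset.
    centered = latents - latents.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal directions sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

# Hypothetical sizes: 1000 latent samples of dimension 64, keep 5 components.
rng = np.random.default_rng(0)
n, d, k = 1000, 64, 5
latents = rng.standard_normal((n, d))  # assumed Gaussian latent prior
components = top_k_principal_directions(latents, k)
```

The rows of `components` are orthonormal, so they could serve as the reference directions that a later step pulls each attribute direction toward.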
To address the poor training stability and generalization capability of deep spectral clustering models, a Deep Spectral Clustering algorithm with L1 Regularization (DSCLR) was proposed. Firstly, L1 regularization was introduced into the objective function of deep spectral clustering to sparsify the eigenvectors of the Laplacian matrix generated by the deep neural network model, thereby enhancing the generalization capability of the model. Secondly, the network structure of the deep-neural-network-based spectral clustering algorithm was improved by using the Parametric Rectified Linear Unit (PReLU) activation function to solve the problems of unstable model training and underfitting. Experimental results on the MNIST dataset show that, compared with the deep spectral clustering algorithm, the proposed algorithm improves Clustering Accuracy (CA), Normalized Mutual Information (NMI), and Adjusted Rand Index (ARI) by 11.85, 7.75, and 17.19 percentage points respectively. Furthermore, the proposed algorithm also significantly improves the three evaluation metrics, CA, NMI and ARI, compared with algorithms such as Deep Embedded Clustering (DEC) and Deep Spectral Clustering using Dual Autoencoder Network (DSCDAN).
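The two ingredients named in the DSCLR abstract above, an L1 penalty on the spectral embedding and the PReLU activation, can be sketched in NumPy as follows. This is only a minimal sketch: the exact objective, its normalization, and the weight `lam` are assumptions, since the abstract does not give the formula, and the function names are hypothetical.

```python
import numpy as np

def prelu(x, alpha):
    """Parametric ReLU: identity for positive inputs, slope alpha otherwise.

    In a trained network alpha would be a learnable parameter; here it is fixed.
    """
    return np.where(x > 0, x, alpha * x)

def l1_spectral_loss(Y, W, lam):
    """Spectral embedding loss plus an L1 penalty on the embedding Y (assumed form).

    Y: (n, k) network outputs, W: (n, n) symmetric affinity matrix.
    """
    n = Y.shape[0]
    laplacian = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian
    # trace(Y^T L Y) equals 0.5 * sum_ij W_ij * ||y_i - y_j||^2
    smoothness = np.trace(Y.T @ laplacian @ Y) / n
    sparsity = lam * np.abs(Y).sum() / n    # L1 term that encourages sparse embeddings
    return smoothness + sparsity

# Tiny example: two connected points embedded at +1 and -1.
W = np.array([[0.0, 1.0], [1.0, 0.0]])
Y = np.array([[1.0], [-1.0]])
loss = l1_spectral_loss(Y, W, lam=0.1)
```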
Group activity recognition is a challenging task in complex scenes, since it involves the interactions and relative spatial positions of a group of people in the scene. Current group activity recognition methods either lack fine-grained design or do not take full advantage of the interactive features among individuals. Therefore, a network framework based on a partitioned attention mechanism and interactive position relationships was proposed, which further considered the semantic features of individual limbs and explored the relationship between interaction feature similarity and behavior consistency among individuals. Firstly, the original video sequences and optical flow image sequences were used as the input of the network, and a partitioned attention feature module was introduced to refine the limb motion features of individuals. Secondly, the spatial position and interactive distance were taken as individual interaction features. Finally, the individual motion features and spatial position relation features were fused as the node features of an undirected graph of the group scene, and a Graph Convolutional Network (GCN) was adopted to further capture the activity interactions in the global scene, thereby recognizing the group activity. Experimental results show that this framework achieves recognition accuracies of 92.8% and 97.7% on two group activity recognition datasets, the Collective Activity Dataset (CAD) and the Collective Activity Extended Dataset (CAE), respectively. Compared with Actor Relationship Graph (ARG) and Confidence Energy Recurrent Network (CERN) on the CAD dataset, this framework improves the recognition accuracy by 1.8 percentage points and 5.6 percentage points respectively. In addition, the results of ablation experiments show that the proposed algorithm achieves better recognition performance.
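The final step of the group activity abstract above, fusing per-person motion and position features as graph nodes and applying a GCN, might look like the following NumPy sketch. Several choices here are assumptions not stated in the abstract: the distance threshold used to build the undirected graph, fusion by concatenation, the single layer with symmetric normalization, and the max pooling over people. All names and sizes are hypothetical.

```python
import numpy as np

def normalized_adjacency(positions, radius):
    """Undirected scene graph: connect people closer than `radius`, then normalize.

    Self-loops are included because each person has distance 0 to themselves.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist <= radius).astype(float)
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2

def gcn_layer(a_hat, h, w):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    return np.maximum(a_hat @ h @ w, 0.0)

# Hypothetical scene with three people: two close together, one far away.
rng = np.random.default_rng(0)
positions = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
motion = rng.standard_normal((3, 4))                  # stand-in for limb motion features
nodes = np.concatenate([motion, positions], axis=1)   # fused node features, shape (3, 6)
a_hat = normalized_adjacency(positions, radius=2.0)
hidden = gcn_layer(a_hat, nodes, rng.standard_normal((6, 8)))
group_feature = hidden.max(axis=0)                    # pooled scene-level descriptor
```

In this example only the two nearby people exchange information, while the distant person keeps their own features, which is the intuition behind using position as an interaction cue.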
A novel Kernel-based Non-negative Sparse Representation (KNSR) method was presented for face recognition. The contributions were mainly in three aspects. First, non-negative constraints on the representation coefficients were introduced into Sparse Representation (SR), and a kernel function was exploited to depict the non-linear relationships among different samples, based on which the corresponding objective function was proposed. Second, a multiplicative gradient descent method was proposed to solve the objective function, which could reach the global optimum in theory. Finally, local binary features and the Hamming kernel were used to model the non-linear relationships among face samples, thereby achieving robust face recognition. The experimental results on several challenging face databases demonstrate that the proposed algorithm achieves higher recognition rates than the Nearest Neighbor (NN), Support Vector Machine (SVM), Nearest Subspace (NS), SR and Collaborative Representation (CR) algorithms, and reaches recognition rates of about 99% on both the YaleB and AR databases.
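To make the multiplicative update in the KNSR abstract above more concrete, the sketch below solves a kernelized non-negative sparse coding problem in NumPy. It assumes the objective 0.5 x^T K x - k_y^T x + lam * sum(x) with x >= 0 and uses the well-known multiplicative rule from non-negative sparse coding; the paper's actual objective and update may differ. The kernel shown (fraction of matching bits) is one plausible reading of "Hamming kernel", and all function names are hypothetical.

```python
import numpy as np

def hamming_kernel(a, b):
    """Fraction of matching bits between rows of binary arrays a (m, d) and b (n, d)."""
    a = a.astype(float)
    b = b.astype(float)
    matches = a @ b.T + (1.0 - a) @ (1.0 - b).T  # counts positions where bits agree
    return matches / a.shape[1]

def objective(K, k_y, x, lam):
    """Assumed objective: kernel reconstruction error (up to a constant) plus L1 term."""
    return 0.5 * x @ K @ x - k_y @ x + lam * x.sum()

def knsr_coefficients(K, k_y, lam=0.01, iters=200, eps=1e-12):
    """Multiplicative updates keep x non-negative as long as K and k_y are non-negative."""
    x = np.full(K.shape[0], 1.0 / K.shape[0])
    for _ in range(iters):
        x = x * k_y / (K @ x + lam + eps)
    return x

# Hypothetical data: 10 binary training codes of 64 bits; the query equals atom 3.
rng = np.random.default_rng(0)
D = rng.integers(0, 2, size=(10, 64))
y = D[3]
K = hamming_kernel(D, D)
k_y = hamming_kernel(D, y[None, :])[:, 0]
x = knsr_coefficients(K, k_y)
```

A classifier built on this would typically assign the query to the class whose training samples give the smallest reconstruction residual, though the abstract does not spell out that decision rule.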